Transliteration Mining with Phonetic Conflation and Iterative Training
Abstract
This paper presents transliteration mining on the data of the ACL 2010 NEWS workshop shared transliteration mining task. Mining used a generative transliteration model that was applied to source-language words and whose output was constrained to words in the target language. A total of 30 runs were performed on 5 language pairs, with 6 runs for each language pair. Because resources were limited, the runs explored phonetic conflation and iterative training of the transliteration model as ways to improve recall. Letter conflation improved recall by as much as 48%, with the gains in recall dwarfing the drops in precision. Iterative training also improved recall, but often at the cost of significant drops in precision. The best runs typically used both letter conflation and iterative training.
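The three ideas named in the abstract (constraining generated output to target-language words, letter conflation, and iterative retraining on mined pairs) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the paper's method: the English-Greek word lists, the `CONFLATE` table, the position-by-position character alignment that stands in for a real generative transliteration model, and the rule that tolerates at most one unseen source letter as a wildcard.

```python
from collections import Counter, defaultdict
from fnmatch import fnmatchcase

# Hypothetical conflation table for Greek letters that sound alike
# (an illustrative assumption, not the table used in the paper).
CONFLATE = str.maketrans({"ς": "σ", "η": "ι", "υ": "ι", "ω": "ο"})

def conflate(word):
    """Map letters in the same phonetic class to one representative."""
    return word.translate(CONFLATE)

def train(pairs):
    """Count character mappings from equal-length pairs, aligned by position."""
    model = defaultdict(Counter)
    for src, tgt in pairs:
        if len(src) == len(tgt):
            for s, t in zip(src, tgt):
                model[s][t] += 1
    return model

def generate(model, word):
    """Emit the most frequent target letter per source letter, '?' if unseen."""
    return "".join(
        model[c].most_common(1)[0][0] if c in model else "?" for c in word
    )

def mine(model, sources, targets, use_conflation=True, max_unknown=1):
    """Keep a source word only if its generated form matches exactly one target word."""
    key = conflate if use_conflation else (lambda w: w)
    mined = []
    for src in sources:
        pattern = key(generate(model, src))
        if pattern.count("?") > max_unknown:
            continue  # too many unseen letters to trust a wildcard match
        hits = [t for t in targets if fnmatchcase(key(t), pattern)]
        if len(hits) == 1:
            mined.append((src, hits[0]))
    return mined

def iterate(seed, sources, targets, rounds=5):
    """Retrain on seed plus mined pairs until a round finds no additional pair."""
    mined = []
    for _ in range(rounds):
        model = train(seed + mined)
        new = mine(model, sources, targets)
        if len(new) == len(mined):
            break
        mined = new
    return mined

# Toy data (hypothetical): two seed pairs, four source words, five target words.
seed = [("maria", "μαρια"), ("nikos", "νικος")]
sources = ["sonia", "omiros", "tasos", "dimitra"]
targets = ["σονια", "ομηρος", "τασος", "δημητρα", "αθηνα"]

without_conflation = mine(train(seed), sources, targets, use_conflation=False)
with_conflation = mine(train(seed), sources, targets)
after_iteration = iterate(seed, sources, targets)
```

On this toy data, exact matching without conflation finds no pairs, conflation alone finds three, and a second training round adds `dimitra` because the letter `t` was learned from a pair mined in the first round. The wildcard rule is also where precision could suffer on real data, since a looser match admits more false pairs.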
Similar resources
Phoneme-based Statistical Transliteration of Foreign Names for OOV Problem
Given a source language term, machine transliteration automatically generates its phonetic equivalents in a target language. It is useful in many cross-language applications. Recently, there are increasing concerns about automatic transliteration, especially with languages with significant distinctions in their phonetic representations, e.g. English and Chinese. Despite many cross-language...
Statistical models for unsupervised, semi-supervised and supervised transliteration mining
We present a generative model that efficiently mines transliteration pairs in a consistent fashion in three different settings, unsupervised, semi-supervised and supervised transliteration mining. The model interpolates two sub-models, one for the generation of transliteration pairs and one for the generation of non-transliteration pairs (i.e. noise). The model is trained on noisy unlabelled da...
Language Independent Transliteration Mining System Using Finite State Automata Framework
We propose a Named Entities transliteration mining system using Finite State Automata (FSA). We compare the proposed approach with a baseline system that utilizes the Editex technique to measure the length-normalized phonetic based edit distance between the two words. We submitted three standard runs in NEWS2010 shared task and ranked first for English to Arabic (WM-EnAr) and obtained an Fmeasu...
Unsupervised Named Entity Transliteration Using Temporal and Phonetic Correlation
In this paper we investigate unsupervised name transliteration using comparable corpora, corpora where texts in the two languages deal in some of the same topics — and therefore share references to named entities — but are not translations of each other. We present two distinct methods for transliteration, one approach using an unsupervised phonetic transliteration method, and the other using t...
Machine Transliteration of Names in Arabic Text
We present a transliteration algorithm based on sound and spelling mappings using finite state machines. The transliteration models can be trained on relatively small lists of names. We introduce a new spelling-based model that is much more accurate than state-of-the-art phonetic-based models and can be trained on easier-to-obtain training data. We apply our transliteration algorithm to the translit...